Contributions of Jitter and Shimmer in the Voice for Fake Audio Detection
Abstract
Fake audio detection (FAD) aims to identify fraudulent speech generated through advanced speech-synthesis techniques. Most current FAD methods rely solely on a deep neural network (DNN) framework with either waveforms or commonly used acoustic features to extract high-level representations, overlooking the analysis of prosody differences between genuine and fake speech. Prosody carries important cues about naturalness and emotional content, which can be leveraged in fake audio detection. This paper explicitly investigates the prosody information represented by jitter and shimmer features. On the basis of our investigation, we found strong evidence that obvious differences exist at the prosody level between fake and real speech; in particular, the [...] feature has large dynamic variation for [...]. To ensure accurate estimation of F0 for better prosody feature extraction, we propose using two additional F0 extraction methods, YIN and SWIPE, in place of the IRAPT algorithm in the feature extraction process. Moreover, we design a DNN-FAD system combining the Mel-spectrogram and prosody features. The effectiveness of the proposed method is evaluated on the datasets of the Audio Deep Synthesis Detection (ADD) 2022 and 2023 challenges. The experimental results show that both static and continuous prosody features, especially those extracted with the YIN and SWIPE algorithms, provide complementary knowledge to traditional spectrum-based FAD systems. The optimal system can effectively reduce the equal error rate from 41.29% to 35.77% on the ADD2023 challenge, achieving a relative improvement of 13.37%.
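For readers unfamiliar with the two features, the following is a minimal sketch of the commonly used "local" jitter and shimmer measures, assuming the glottal cycle periods and per-cycle peak amplitudes have already been obtained from an F0 tracker. It is not the paper's feature pipeline; the function names and the sample values are hypothetical. The last helper only re-checks the arithmetic behind the reported relative improvement in equal error rate.

```python
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive cycles, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def local_jitter(periods: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the fundamental period (hypothetical helper)."""
    return local_perturbation(periods)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the peak amplitude (hypothetical helper)."""
    return local_perturbation(amplitudes)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up example cycles, for illustration only.
    periods = [0.010, 0.011, 0.010, 0.011]   # seconds
    amplitudes = [0.80, 0.78, 0.82, 0.79]    # arbitrary linear units
    print(f"jitter:  {local_jitter(periods):.4f}")
    print(f"shimmer: {local_shimmer(amplitudes):.4f}")
    # EER values quoted in the abstract: 41.29% -> 35.77%.
    print(f"relative EER reduction: {100 * relative_reduction(41.29, 35.77):.2f}%")
```

Both measures are dimensionless ratios, so they can be compared across speakers with different average pitch or loudness. The accuracy of the period estimates matters a great deal here, which is consistent with the abstract's emphasis on the choice of F0 extraction method.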
Similar resources
Jitter, Shimmer, and Noise in Pathological Voice Quality Perception
Although jitter, shimmer, and turbulent noise characterize all voice signals, their perceptual importance has not been established psychoacoustically. To determine which of these acoustic attributes is important in listeners’ perceptions of pathologic voices, listeners used a speech synthesizer to adjust levels of jitter, shimmer, and noise so that synthetic voices matched natural pathological ...
Jitter and shimmer measurements for speaker recognition
Jitter and shimmer are measures of the cycle-to-cycle variations of fundamental frequency and amplitude, respectively, which have been largely used for the description of pathological voice quality. Since they characterise some aspects concerning particular voices, it is a priori expected to find differences in the values of jitter and shimmer among speakers. In this paper, several types of jit...
Using Jitter and Shimmer in speaker verification
Jitter and shimmer are measures of the fundamental frequency and amplitude cycle-to-cycle variations, respectively. Both features have been largely used for the description of pathological voices, and since they characterise some aspects concerning particular voices, they are expected to have a certain degree of speaker specificity. In the current work, jitter and shimmer are successfully used ...
The Search for the Self in Beckett's Theatre: Waiting for Godot and Endgame
This thesis is based upon the works of Samuel Beckett, one of the greatest writers of contemporary literature. Here, I have tried to focus on one of the main themes in Beckett's works: the search for the real "me" or the real self, which is a problem to be solved not only for Beckett's man but also for each of us. I have tried to show Beckett's techniques in approaching this unattainable goal, base...
Study of Cohesive Devices in the Textbook of English for the Students of Psychology by Rastegarpour
This study investigates the cohesive devices used in the textbook of English for the students of psychology. The research questions and hypotheses in the present study concern the frequency and distribution of grammatical and lexical cohesive devices. Then, to answer the questions, all grammatical and lexical cohesive devices in reading comprehension passages from 6 units of 21 units th...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3301616